Classifying drug-induced webs using Topological Data Analysis

Authors
Affiliation

Guilherme Vituri F. Pinto

Unesp

Telmo

??????

Published

February 6, 2026

Abstract

We studied etc etc etc

Keywords

Topological Data Analysis, Persistent homology

1 Setup

1.1 Imports



2 Dataset Description

2.1 Sample Composition

This analysis uses N=25 spider web images collected from spiders exposed to different agricultural chemicals:

| Group | N | Drug Type | Mechanism of Action |
|---|---|---|---|
| CONTROL | 5 | None | Baseline web structure |
| CIPERMETRINA | 5 | Insecticide (Pyrethroid) | Synthetic pyrethroid; disrupts sodium channels causing tremors and impaired motor control |
| ENDOSULFAN | 5 | Insecticide (Organochlorine) | GABA antagonist; causes seizures and neurological disruption (banned in many countries) |
| GLIFOSATO | 5 | Herbicide (Glyphosate) | Glycine analog; disputed neurotoxicity in arthropods |
| SPINOSAD | 5 | Insecticide (Organic) | Bacterial metabolite; nicotinic acetylcholine agonist causing paralysis |

Missing Metadata

The original dataset lacks several critical experimental details:

  • Spider species identification
  • Drug dosages and exposure protocols
  • Spider age, sex, and size
  • Environmental conditions (temperature, humidity, light)
  • Time post-exposure when webs were photographed

This limits biological interpretation and generalizability of findings.

2.2 Biological Context: Why Study Spider Webs?

Spider web construction is a sensitive bioassay for neurotoxicity. Building a web requires:

  1. Precise motor control: Accurate silk placement and anchor point selection
  2. Spatial memory: Following geometric patterns and maintaining symmetry
  3. Proprioception: Body position awareness during construction

Drugs affecting the nervous system disrupt these processes, manifesting as structural changes visible in the web geometry.

2.2.1 Historical Precedent

Spider web pharmacology dates to 1948 (Witt et al.): drugs like caffeine, LSD, and marijuana produce characteristic web deformations. Modern applications include:

  • Environmental toxicology monitoring
  • Pesticide safety assessment
  • Neurological drug screening

2.2.2 Expected Drug Effects

Based on mechanism of action:

  • Pyrethroids (CIPERMETRINA): Sodium channel disruption → tremors → irregular silk placement
  • Organochlorines (ENDOSULFAN): GABA antagonist → seizures → chaotic structures
  • Glyphosate (GLIFOSATO): Herbicide with disputed neurotoxicity → unclear effect expected
  • Spinosad: Nicotinic agonist → paralysis → incomplete or simplified webs

2.3 Why TDA for This Problem?

Traditional image analysis methods (edge detection, Fourier analysis, texture features) struggle with spider webs because:

  1. Geometric irregularity: Webs don’t follow rigid templates
  2. Scale variation: Cell sizes vary across the web
  3. Partial structures: Incomplete or torn webs

TDA Advantages:

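  • Topological invariance: Unchanged by rotation and translation, and stable under small perturbations; uniform rescaling only multiplies all birth and death values by the same factor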
  • Multi-scale analysis: Persistence diagrams capture features at all scales simultaneously
  • Interpretable features: H0 = fragmentation, H1 = loop structure (cells/meshes)
  • No template required: Data-driven rather than model-based

TDA Hypothesis

We hypothesize that drug-induced neurological impairment will manifest as:

  • H1 features (closed loops) decreasing under drugs that impair motor coordination
  • H0 features (fragmentation) increasing if drugs cause severe behavioral disruption
  • Persistence entropy decreasing under drugs that produce irregular cell patterns

Topological features should provide more sensitive detection than simple metrics like web area or thread count.

3 Data Loading

3.1 Load Images

Groups: SubString{String}["CIPERMETRINA", "CONTROL", "ENDOSULFAN", "GLIFOSATO", "SPINOSAD"]
Samples per group:
  CIPERMETRINA: 5
  CONTROL: 5
  ENDOSULFAN: 5
  GLIFOSATO: 5
  SPINOSAD: 5

3.2 Sample Size and Statistical Power

Statistical Power Considerations

Dataset Size: N=25 total (5 samples per group)

This sample size has important implications for statistical inference:

3.2.1 Detection Limits

With N=5 per group, only very large effects can be detected reliably. For a two-sided, two-sample comparison at α = 0.05, approximate power is:

  • Small effects (|d| ≈ 0.2): Very low power (~5%)
  • Medium effects (|d| ≈ 0.5): Low power (~10%)
  • Large effects (|d| ≈ 0.8): Low power (~20%)
  • Very large effects (|d| ≈ 2): Adequate power (~80%)
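
As a rough check on power at this sample size, the following Python sketch uses a normal approximation to a two-sided, two-sample test. The function name `approx_power` is illustrative and not part of the notebook; the approximation is somewhat optimistic for n = 5, where an exact noncentral-t calculation gives lower values.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power for a two-sided, two-sample comparison."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative.
    shift = abs(d) * sqrt(n_per_group / 2)
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


for d in (0.2, 0.5, 0.8, 2.0):
    print(f"|d| = {d}: approximate power = {approx_power(d, 5):.2f}")
```

Under this approximation, an effect of |d| = 0.8 reaches 80% power only at roughly 26 samples per group.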

3.2.2 Statistical Concerns

  1. Wide confidence intervals: Effect size estimates are imprecise
  2. High variance: Cross-validation results show substantial standard deviations
  3. No independent validation: All results use cross-validation (leave-one-out or 5-fold) on the same 25 samples
  4. Overfitting risk: Classification models may capture sample-specific noise

3.2.3 Study Interpretation

Given these limitations, this analysis should be interpreted as:

  • ✓ Proof-of-concept demonstrating TDA methodology
  • ✓ Exploratory analysis generating hypotheses
  • ✓ Method validation showing feasibility
  • ✗ NOT definitive biological conclusions
  • ✗ NOT generalizable without replication
  • ✗ NOT powered for detecting subtle effects

3.2.4 Recommendations for Future Work

  • Minimal viable study: N ≥ 20 per group
  • Well-powered study: N ≥ 30 per group
  • Independent validation cohort: 70/30 train/test split or multi-site data

Results presented here are likely optimistic (cross-validated accuracy on 25 samples is prone to overfitting) and should guide future adequately powered studies.

3.3 Point Cloud Sampling

We extract 1000 points from each web using the farthest point sampling algorithm.
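
A minimal Python sketch of greedy farthest point sampling is shown below. It is not the notebook's implementation; the function name, the fixed starting index and the toy 5×5 grid are assumptions made for illustration.

```python
import math


def farthest_point_sample(points, k, start=0):
    """Greedily pick k indices, each as far as possible from those already chosen."""
    chosen = [start]
    # Distance from every point to its nearest chosen point so far.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < min(k, len(points)):
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


# Hypothetical stand-in for web pixel coordinates: a 5x5 grid.
grid = [(x, y) for x in range(5) for y in range(5)]
sample = farthest_point_sample(grid, 4)
print([grid[i] for i in sample])  # the four corners of the grid
```

Because each new point maximizes its distance to the current sample, the selected points spread evenly over the web instead of clustering in dense regions.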

4 Persistence Diagrams

4.1 Compute Rips Filtration

4.2 Extract H0 and H1

Number of samples: 25
H0 diagram example (first non-empty): 
  Sample 1: 1000 features
H1 diagram example (first non-empty): 
  Sample 1: 71 features

4.3 Representative Web Examples

Before diving into feature extraction and statistics, let’s visualize representative spider webs from each treatment group along with their persistence diagrams. Each panel shows:

  • Top left: Persistence diagram (birth-death plot showing H1 cycles)
  • Top right: Web image intensity heatmap
  • Bottom: Point cloud sample used for TDA computation

5 Feature Extraction

Understanding TDA Features

We extract several statistical summaries from each persistence diagram. Here’s what each feature measures and how it relates to web structure:

| Feature | What it measures | Web interpretation |
|---|---|---|
| n_features | Number of H1 cycles detected | Number of closed cells in the web |
| total_persistence | Sum of all cycle lifespans | Overall topological complexity |
| median_persistence | Median cycle lifespan | Typical cell “robustness” (robust to outliers) |
| max_persistence | Largest cycle lifespan | Most prominent hole or cell |
| entropy | Uniformity of cycle lifespans | High = regular cells; Low = irregular cells |
| median_birth | Median scale at which cycles appear | Typical cell size (robust to outliers) |

These features transform complex persistence diagrams into interpretable numbers that can be compared statistically across treatment groups.
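
The summaries in the table can be sketched in Python as follows. This is an illustration, not the notebook's code: the function name is hypothetical, and the natural logarithm is assumed for the entropy (the reported values stay below ln(n_features), which is consistent with that choice).

```python
import math
from statistics import median


def diagram_features(diagram):
    """Summaries of a persistence diagram given as (birth, death) pairs."""
    bars = [(b, d) for b, d in diagram if d > b]
    lifespans = [d - b for b, d in bars]
    total = sum(lifespans)
    probs = [life / total for life in lifespans]
    return {
        "n_features": len(lifespans),
        "total_persistence": total,
        "median_persistence": median(lifespans),
        "max_persistence": max(lifespans),
        # Persistence entropy: Shannon entropy of the normalized lifespans.
        "entropy": -sum(p * math.log(p) for p in probs),
        "median_birth": median(b for b, _ in bars),
    }


uniform = diagram_features([(0, 1), (1, 2), (2, 3), (3, 4)])
skewed = diagram_features([(0, 10), (0, 0.1), (0, 0.1)])
print(uniform["entropy"], skewed["entropy"])
```

Four equal lifespans give the maximum entropy ln(4); one dominant lifespan among tiny ones gives a much lower value, matching the "regular vs irregular cells" reading.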

5.1 Rich Statistics - H1 (Cycles)

25×6 DataFrame
Row Specie n_features total_persistence median_persistence max_persistence entropy
SubStrin… Int64 Float64 Float64 Float64 Float64
1 CIPERMETRINA 71 831.438 9.631 66.449 4.079
2 CIPERMETRINA 69 779.229 8.354 37.007 4.076
3 CIPERMETRINA 66 925.464 11.833 58.847 4.033
4 CIPERMETRINA 69 777.478 10.289 28.721 4.138
5 CIPERMETRINA 50 499.91 8.61 22.183 3.811
6 CONTROL 68 473.123 6.0 19.114 4.159
7 CONTROL 81 577.74 6.403 14.111 4.357
8 CONTROL 138 1086.26 6.606 38.022 4.836
9 CONTROL 108 764.702 6.43 16.632 4.643
10 CONTROL 93 626.066 6.275 12.905 4.507
11 ENDOSULFAN 87 648.155 6.298 31.075 4.379
12 ENDOSULFAN 68 543.676 6.945 22.366 4.145
13 ENDOSULFAN 80 812.353 6.842 80.652 4.111
14 ENDOSULFAN 59 610.397 6.896 58.875 3.785
15 ENDOSULFAN 61 616.653 8.022 55.703 3.941
16 GLIFOSATO 60 965.556 10.522 107.758 3.752
17 GLIFOSATO 43 695.187 10.05 60.592 3.474
18 GLIFOSATO 72 868.896 8.868 79.585 4.049
19 GLIFOSATO 54 695.542 9.893 68.599 3.765
20 GLIFOSATO 67 866.073 9.56 51.756 4.025
21 SPINOSAD 70 793.129 9.768 37.691 4.093
22 SPINOSAD 69 827.155 10.814 53.565 4.099
23 SPINOSAD 72 780.85 8.964 38.603 4.137
24 SPINOSAD 64 747.34 9.938 27.148 4.04
25 SPINOSAD 78 831.255 9.219 57.754 4.216

5.2 H0 (Connected Components) - Not Analyzed

Why H0 is Not Analyzed

All spider webs in this dataset remain structurally connected (single component), meaning H0 persistence provides minimal discriminatory information between treatment groups.

The absence of web fragmentation suggests:

  • Spiders complete web construction despite drug exposure
  • Drug effects manifest as changes in loop/cell patterns within a connected structure (H1 features) rather than as complete structural breakdown

Therefore, we focus our analysis on H1 (one-dimensional persistence) which captures the relevant differences in cell structure and regularity.

5.3 Feature Matrices

H1 features: [:n_features, :total_persistence, :median_persistence, :std_persistence, :max_persistence, :q25, :q50, :q75, :q90, :entropy, :median_birth, :birth_range]
Feature matrix size: (25, 12)

5.4 Vectorized Diagram Features

Vectorized features dimension: 362

6 Exploratory Visualization

6.1 Summary Statistics by Drug

Mean Statistics by Group:

CIPERMETRINA:
  - Mean cycles (H1): 65.0
  - Mean entropy: 4.027
  - Mean max persistence: 42.642

CONTROL:
  - Mean cycles (H1): 97.6
  - Mean entropy: 4.5
  - Mean max persistence: 20.157

ENDOSULFAN:
  - Mean cycles (H1): 71.0
  - Mean entropy: 4.072
  - Mean max persistence: 49.734

GLIFOSATO:
  - Mean cycles (H1): 59.2
  - Mean entropy: 3.813
  - Mean max persistence: 73.658

SPINOSAD:
  - Mean cycles (H1): 70.6
  - Mean entropy: 4.117
  - Mean max persistence: 42.952

6.2 Betti Curves by Drug

6.3 Average Persistence Images

6.4 Within-Group Variability

Some drug groups show more heterogeneity in web structure than others. Below we show the most and least complex webs (by entropy) within each group to illustrate this variability. High entropy indicates regular, uniform cell sizes; low entropy indicates irregular cells.

6.5 Feature Distributions by Group

The boxplots below show how key TDA features are distributed across treatment groups. This helps visualize group differences before formal statistical testing.

Figure 1: Distribution of key TDA features across treatment groups

7 Distance Analysis

The Wasserstein distance (also called Earth Mover’s Distance) measures how different two persistence diagrams are.

Intuition: Imagine each point in a persistence diagram as a pile of dirt. The Wasserstein distance is the minimum “work” needed to transform one diagram into another by moving dirt around.

Why use it for TDA?

  • Specifically designed for comparing persistence diagrams
  • Captures both the locations of topological features and how they should be matched
  • Has metric properties, enabling use with standard machine learning methods (like KNN)

Notation: Wasserstein(p, q) — we use p=1 (sum of movements) and q=2 (Euclidean ground metric).

A small Wasserstein distance means two webs have similar topological structure; a large distance means their persistence diagrams differ substantially.

Multidimensional scaling (MDS) converts a distance matrix into low-dimensional coordinates for visualization:

  1. Start with pairwise distances between all samples
  2. Find 2D or 3D coordinates that preserve these distances as well as possible
  3. Plot the coordinates — samples close together have similar features

How to interpret MDS plots:

  • Clusters = groups of samples with similar topological features
  • Separation between clusters = distinct TDA signatures between groups
  • Overlap between groups = ambiguity; these groups are hard to distinguish topologically

MDS is purely for visualization — it doesn’t make statistical claims, but helps us see patterns before formal testing.
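
The three MDS steps can be sketched in Python with classical (Torgerson) MDS. Which MDS variant the notebook uses is not stated, so this choice is an assumption; the power iteration stands in for a proper eigensolver to keep the example dependency-free.

```python
import math
import random


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Classical MDS: double-centre the squared distances, keep the top eigenvectors."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    grand = sum(row_mean) / n
    # Gram matrix B = -1/2 * J * D^2 * J (dist is assumed symmetric).
    B = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [rng.gauss(0, 1) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):  # power iteration for the dominant eigenvector
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(B[i][j] * v[j] for j in range(n)) for i in range(n))
        for i in range(n):
            coords[i][k] = math.sqrt(max(lam, 0.0)) * v[i]
            for j in range(n):
                B[i][j] -= lam * v[i] * v[j]  # deflate before the next axis
    return coords


# Toy check: four corners of a 3x4 rectangle are recovered up to rotation.
rect = [(0, 0), (0, 4), (3, 0), (3, 4)]
rect_dist = [[math.dist(p, q) for q in rect] for p in rect]
embedding = classical_mds(rect_dist)
```

For Euclidean input the embedding reproduces the distances exactly; for Wasserstein distances some eigenvalues may be negative, which this sketch clamps to zero.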

7.1 Wasserstein Distance Matrix

25×25 Matrix{Float64}:
   0.0    125.421  149.505   149.996  …  161.391  207.911  158.323  263.38
 125.421    0.0    153.59    135.848     157.259  215.999  133.712  267.336
 149.505  153.59     0.0     182.613     151.695  251.085  160.943  336.517
 149.996  135.848  182.613     0.0       105.588  144.571  101.393  181.717
 266.832  230.788  308.53    212.951     244.25   213.088  184.152  254.225
 299.746  276.566  391.579   242.585  …  306.842  278.482  255.82   263.543
 315.641  293.247  406.906   282.731     338.736  337.27   299.099  289.382
 600.395  578.767  652.597   637.794     641.536  694.576  651.218  747.219
 416.68   389.453  509.701   398.214     456.635  462.956  411.355  407.379
 385.303  360.164  482.388   347.347     412.4    399.277  363.692  338.669
 309.979  283.324  393.786   278.028  …  320.495  319.393  302.347  303.891
 244.195  225.081  334.331   179.349     241.556  202.346  194.103  214.974
 159.134  194.802  257.867   252.589     249.588  264.4    267.983  318.443
 246.778  232.008  299.076   274.348     253.482  255.08   254.396  379.716
 173.598  196.275  247.833   213.122     197.099  210.929  185.743  274.036
 282.592  300.674  272.063   344.175  …  282.728  327.157  303.107  433.23
 397.027  351.363  382.757   400.73      370.031  361.843  377.011  451.677
 128.08   172.289  184.227   192.1       154.853  187.414  197.367  299.317
 325.133  307.064  329.953   287.327     271.733  231.725  283.73   356.261
 141.482  133.883   96.7042  178.781     134.393  235.819  150.86   327.573
 144.326  106.155  184.168   105.203  …  137.01   171.337  131.314  215.534
 161.391  157.259  151.695   105.588       0.0    163.167  127.829  253.459
 207.911  215.999  251.085   144.571     163.167    0.0    165.233  211.368
 158.323  133.712  160.943   101.393     127.829  165.233    0.0    211.254
 263.38   267.336  336.517   181.717     253.459  211.368  211.254    0.0

7.2 Distance Metric Comparison: Wasserstein vs Bottleneck

Different distance metrics capture different aspects of topological dissimilarity. We compare two fundamental persistence diagram distances:

Wasserstein vs Bottleneck: Theoretical Differences

| Property | Wasserstein W₁ | Bottleneck d∞ |
|---|---|---|
| Definition | Optimal matching cost (total transport) | Worst-case matching cost (max single distance) |
| Formula | Sum of all point distances in optimal matching | Maximum single point distance in optimal matching |
| Sensitivity | Sensitive to all points (global measure) | Determined by the single worst-matched point (local measure) |
| Stability | Many small noise features add up in the total | Unaffected by small noise features; set by the largest mismatch |
| Interpretation | “Average structural difference” | “Maximum local difference” |
| Computation | O(n³) via Hungarian algorithm | O(n^2.5) via bipartite matching with a threshold search |

When to use which?

  • Wasserstein: When all topological features matter; captures overall structural difference
  • Bottleneck: When the largest discrepancy matters; robust to small noise but sensitive to big changes
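
To make the sum-versus-maximum distinction concrete, the Python sketch below computes both distances for two tiny diagrams by trying every matching. It is only an illustration under stated assumptions (Euclidean ground metric, points may be matched to the diagonal); brute force is feasible only for a handful of points, and real libraries use the matching algorithms listed in the table.

```python
import math
from itertools import permutations


def diagram_distances(dgm_a, dgm_b):
    """Return (W1, bottleneck) for two small diagrams of (birth, death) pairs."""
    def to_diagonal(p):
        return (p[1] - p[0]) / math.sqrt(2)  # distance to the line birth = death

    n, m = len(dgm_a), len(dgm_b)
    size = n + m
    # Pad both sides with diagonal slots so any point can be left unmatched.
    cost = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < n and j < m:
                cost[i][j] = math.dist(dgm_a[i], dgm_b[j])
            elif i < n:
                cost[i][j] = to_diagonal(dgm_a[i])
            elif j < m:
                cost[i][j] = to_diagonal(dgm_b[j])
    w1 = bottleneck = math.inf
    for perm in permutations(range(size)):
        costs = [cost[i][perm[i]] for i in range(size)]
        w1 = min(w1, sum(costs))                          # total transport
        bottleneck = min(bottleneck, max(costs, default=0.0))  # worst single move
    return w1, bottleneck


w1, bn = diagram_distances([(0, 4), (1, 2)], [(0, 5)])
print(f"W1 = {w1:.3f}, bottleneck = {bn:.3f}")
```

Here the short-lived point (1, 2) adds its diagonal distance to W₁ but does not affect the bottleneck distance, which is set by the one long-lived pair.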

7.2.1 Compute Both Distance Matrices

Computing Wasserstein distance matrix...
Computing Bottleneck distance matrix...

Distance matrix statistics:
Wasserstein - Min: 87.923, Max: 799.563, Mean: 296.694
Bottleneck  - Min: 3.207, Max: 53.879, Mean: 24.31

7.2.2 Compare Distance Distributions


Correlation between distance metrics: 0.209

7.2.3 Classification Performance Comparison


=== Classification Accuracy Comparison ===
Wasserstein W₁: 44.0%
Bottleneck d∞:  20.0%

⇒ Wasserstein OUTPERFORMS Bottleneck
  Global structure more informative than local extrema

7.2.4 Visualize Distance Matrices

7.2.5 Interpretation


=== Distance Metric Analysis ===

✗ Low correlation (r = 0.21)
  Metrics capture fundamentally different structures
  Choice significantly impacts conclusions

Recommendation:
→ Use Wasserstein distance for this dataset
  Better classification performance
  Captures overall structural differences relevant to drug effects

7.3 Euclidean Distance on Rich Stats

7.4 MDS Embeddings

8 Statistical Tests

Our statistical analysis follows a three-stage approach:

  1. Omnibus test (Kruskal-Wallis): Do ANY groups differ from each other?
  2. Pairwise comparisons (Permutation tests): WHICH drugs differ from control?
  3. Effect sizes (Cohen’s d): HOW MUCH do they differ?

This hierarchical approach controls false positives while providing interpretable effect magnitudes.

The Kruskal-Wallis test is a non-parametric alternative to one-way ANOVA. We use it here because:

  1. No normality assumption: Unlike ANOVA, it doesn’t require the data to follow a normal distribution — important for TDA features which may have unusual distributions
  2. Robust to outliers: Uses ranks instead of raw values, so extreme points don’t dominate
  3. Works with small samples: Reliable even with limited data per group

How to interpret the p-value:

  • p < 0.05: Strong evidence that at least one group differs from the others (marked with *)
  • p ≥ 0.05: Insufficient evidence to conclude groups differ

Why not use ANOVA? With small sample sizes and potentially non-normal distributions (common in TDA features), Kruskal-Wallis is more reliable and makes fewer assumptions.
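
As an illustration of the rank-based computation, the Python sketch below applies the Kruskal-Wallis H statistic to the entropy column of the rich-statistics table in Section 5.1. The function is a simplified version without tie correction (the data have no ties), and the closed-form p-value holds only for 4 degrees of freedom, i.e. five groups.

```python
import math

# Entropy values copied from the H1 rich-statistics table (rounded as displayed).
entropy = {
    "CIPERMETRINA": [4.079, 4.076, 4.033, 4.138, 3.811],
    "CONTROL": [4.159, 4.357, 4.836, 4.643, 4.507],
    "ENDOSULFAN": [4.379, 4.145, 4.111, 3.785, 3.941],
    "GLIFOSATO": [3.752, 3.474, 4.049, 3.765, 4.025],
    "SPINOSAD": [4.093, 4.099, 4.137, 4.040, 4.216],
}


def kruskal_wallis_h(groups):
    """H statistic for a dict of group name -> values (no tie correction)."""
    pooled = sorted(v for vals in groups.values() for v in vals)
    rank = {}
    i = 0
    while i < len(pooled):  # average rank for each run of equal values
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2
        i = j
    n = len(pooled)
    rank_term = sum(sum(rank[v] for v in vals) ** 2 / len(vals)
                    for vals in groups.values())
    return 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)


h = kruskal_wallis_h(entropy)
# Chi-square survival function in closed form for 4 degrees of freedom.
p = math.exp(-h / 2) * (1 + h / 2)
print(f"H = {h:.2f}, p = {p:.4f}")
```

On these values the sketch gives H ≈ 15.38 and p ≈ 0.004, in line with the entropy result reported in the next subsection.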

8.1 Kruskal-Wallis Tests

Kruskal-Wallis Tests for Group Differences:

entropy: p = 0.004 *
n_features: p = 0.0495 *
max_persistence: p = 0.0123 *
total_persistence: p = 0.2011 

A p-value tells you if groups differ statistically, but effect size tells you how much they differ in practical terms.

Cohen’s d measures the standardized difference between two group means:

\[d = \frac{\bar{x}_1 - \bar{x}_2}{s_{pooled}}\]

where \(s_{pooled}\) is the pooled standard deviation of both groups.

Interpretation guidelines:

| d value (absolute) | Effect size | Interpretation |
|---|---|---|
| < 0.2 | Negligible | Groups nearly identical |
| 0.2 – 0.5 | Small | Detectable but minor difference |
| 0.5 – 0.8 | Medium | Noticeable practical difference |
| > 0.8 | Large | Substantial, meaningful difference |

Why effect size matters: With large samples, even tiny differences can be “statistically significant” (p < 0.05) but practically meaningless. Effect size helps distinguish meaningful differences from trivial ones.
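
The formula for Cohen's d can be checked in Python against the entropy values for CIPERMETRINA and CONTROL from the rich-statistics table. The function name is illustrative; the pooled standard deviation uses the usual sample-variance weighting.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treated, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treated), len(control)
    pooled_var = ((n1 - 1) * variance(treated)
                  + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / sqrt(pooled_var)


# Entropy values copied from the H1 rich-statistics table.
cipermetrina = [4.079, 4.076, 4.033, 4.138, 3.811]
control = [4.159, 4.357, 4.836, 4.643, 4.507]

d = cohens_d(cipermetrina, control)
print(f"d = {d:.2f}")
```

This yields d ≈ -2.31, the value listed for CIPERMETRINA entropy in the pairwise comparison table.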

A permutation test is a non-parametric method to compute p-values without assuming any particular distribution:

  1. Calculate the observed difference between groups (e.g., difference in mean entropy)
  2. Randomly shuffle group labels many times (e.g., 10,000 permutations)
  3. Recalculate the difference after each shuffle
  4. Count how often the shuffled difference is at least as extreme as the observed difference
  5. p-value = (count + 1) / (n_permutations + 1)

Advantages:

  • No distributional assumptions — works for any data
  • Works with any test statistic
  • Provides valid p-values even for small samples (exact when every relabeling is enumerated, approximate when relabelings are sampled at random)
  • Intuitive interpretation: “how often would we see this difference by chance?”

Used here: We compare each drug group to CONTROL using permutation tests to get reliable p-values.
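
With 5 samples per group there are only 252 ways to split 10 values into two groups of 5, so the shuffling step can be replaced by full enumeration. The Python sketch below does this for the entropy of CIPERMETRINA vs CONTROL; it is an illustrative two-sided variant using the absolute mean difference, not the notebook's Monte Carlo routine.

```python
from itertools import combinations


def exact_permutation_test(a, b):
    """Two-sided exact p-value for the absolute difference in means."""
    pooled = list(a) + list(b)
    n_a = len(a)

    def abs_mean_diff(idx_a):
        in_a = set(idx_a)
        xs = [v for i, v in enumerate(pooled) if i in in_a]
        ys = [v for i, v in enumerate(pooled) if i not in in_a]
        return abs(sum(xs) / len(xs) - sum(ys) / len(ys))

    observed = abs_mean_diff(range(n_a))
    splits = list(combinations(range(len(pooled)), n_a))
    # Small tolerance guards against floating-point noise in the comparison.
    extreme = sum(abs_mean_diff(s) >= observed - 1e-12 for s in splits)
    return extreme / len(splits)


# Entropy values copied from the H1 rich-statistics table.
cipermetrina = [4.079, 4.076, 4.033, 4.138, 3.811]
control = [4.159, 4.357, 4.836, 4.643, 4.507]
p_exact = exact_permutation_test(cipermetrina, control)
print(f"p = {p_exact:.4f}")
```

Because every CIPERMETRINA value lies below every CONTROL value, only the observed split and its mirror image are this extreme, giving p = 2/252 ≈ 0.0079. This is also the smallest two-sided p-value an exact test of this kind can produce with 5 vs 5 samples.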

A Note on Multiple Comparisons

When we test multiple features across multiple drug groups, we increase the chance of false positives. With 4 drugs × 3 features = 12 tests at α = 0.05, we expect about 0.6 false positives by chance alone.

Recommendations for interpreting results:

  • Focus on results with p < 0.01 (more stringent threshold)
  • Prioritize findings with large effect sizes (|d| > 0.8)
  • Look for consistent patterns across related features (e.g., both entropy and n_cycles showing similar direction)

Results that meet multiple criteria (low p-value AND large effect size AND consistent pattern) are most reliable.

8.2 Pairwise Drug Comparisons with Effect Sizes

12×6 DataFrame
Row drug feature diff_pct cohens_d effect_size p_value
SubStrin… String Float64 Float64 String Float64
1 CIPERMETRINA entropy -10.5 -2.31 large 0.0077
2 ENDOSULFAN entropy -9.5 -1.76 large 0.0304
3 GLIFOSATO entropy -15.3 -2.77 large 0.0029
4 SPINOSAD entropy -8.5 -2.02 large 0.0158
5 CIPERMETRINA n_cycles -33.4 -1.63 large 0.0294
6 ENDOSULFAN n_cycles -27.3 -1.27 large 0.0616
7 GLIFOSATO n_cycles -39.3 -1.86 large 0.0174
8 SPINOSAD n_cycles -27.7 -1.39 large 0.0462
9 CIPERMETRINA max_persistence 111.5 1.46 large 0.0613
10 ENDOSULFAN max_persistence 146.7 1.64 large 0.0382
11 GLIFOSATO max_persistence 265.4 3.16 large 0.0028
12 SPINOSAD max_persistence 113.1 1.99 large 0.0202

Multiple comparison correction:
  Number of tests: 12
  Bonferroni-corrected α = 0.0042
  Comparisons significant after correction: 2
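
The correction summary can be reproduced from the p_value column of the table above with a few lines of Python. The variable names are illustrative.

```python
# p-values copied from the pairwise comparison table (rows 1-12).
p_values = [0.0077, 0.0304, 0.0029, 0.0158, 0.0294, 0.0616,
            0.0174, 0.0462, 0.0613, 0.0382, 0.0028, 0.0202]

alpha = 0.05
m = len(p_values)

# Expected number of false positives if every null hypothesis were true.
expected_false_positives = m * alpha

# Bonferroni: test each comparison at alpha / m instead of alpha.
bonferroni_alpha = alpha / m
n_significant = sum(p < bonferroni_alpha for p in p_values)

print(f"tests: {m}, corrected alpha: {bonferroni_alpha:.4f}, "
      f"significant: {n_significant}")
```

The two comparisons that survive are GLIFOSATO entropy (p = 0.0029) and GLIFOSATO max_persistence (p = 0.0028).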

9 Classification

Beyond hypothesis testing, we can ask: can we automatically identify which drug a spider was exposed to based on its web’s topological features? This is a classification task.

k-nearest neighbors (KNN) is one of the simplest classification algorithms. To classify a new sample:

  1. Compute the distance from the new sample to all training samples
  2. Find the k nearest neighbors (k closest training samples)
  3. Assign the majority class among those neighbors

Key parameter: k (number of neighbors)

  • Small k (e.g., k=1 or k=3): More sensitive to local patterns, but also to noise
  • Large k (e.g., k=10+): More robust, but may miss subtle differences

We use k=3 as a balanced choice that captures local structure without being overly sensitive to outliers.

Distance metric matters: We test both Wasserstein distance (comparing persistence diagrams directly) and Euclidean distance (comparing extracted features).

The problem: If we train and test on the same data, we get overly optimistic accuracy because the model has “seen” the answers. We need to estimate performance on unseen data.

9.0.1 Leave-One-Out Cross-Validation (LOOCV)

  1. Remove one sample from the dataset
  2. Train the model on the remaining n-1 samples
  3. Predict the class of the held-out sample
  4. Repeat for every sample
  5. Accuracy = proportion of correct predictions

Pros: Uses maximum training data; deterministic (same result every time)

Cons: Computationally expensive; can have high variance
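
The LOOCV steps combine naturally with KNN on a precomputed distance matrix, as in the Python sketch below. The function name, the tie-breaking rule and the toy data are assumptions for illustration; the notebook's own implementation is not shown in this document.

```python
from collections import Counter


def loocv_knn_accuracy(dist, labels, k=3):
    """Leave-one-out accuracy of KNN on a precomputed distance matrix."""
    n = len(labels)
    correct = 0
    for i in range(n):
        # Hold out sample i and rank the remaining samples by distance to it.
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        votes = Counter(labels[j] for j in others[:k])
        # On a tied vote, the label encountered first (the nearest one) wins.
        predicted = votes.most_common(1)[0][0]
        correct += predicted == labels[i]
    return correct / n


# Toy data: two well-separated groups on a line.
positions = [0, 1, 2, 10, 11, 12]
toy_labels = ["a", "a", "a", "b", "b", "b"]
toy_dist = [[abs(p - q) for q in positions] for p in positions]
accuracy = loocv_knn_accuracy(toy_dist, toy_labels, k=3)
print(accuracy)
```

Working from a distance matrix means the same routine applies unchanged to Wasserstein, Bottleneck or Euclidean distances.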

9.0.2 K-Fold Cross-Validation

  1. Split data into k equal folds (e.g., k=5)
  2. For each fold: train on k-1 folds, test on the remaining fold
  3. Average accuracy across all folds

Pros: Good balance of bias and variance; faster than LOOCV

Cons: Results vary slightly depending on random split (we report mean ± std)

9.0.3 Interpreting Results

  • LOOCV accuracy: Single number, deterministic
  • K-fold accuracy: Reported as mean ± standard deviation
  • Higher accuracy = better classification; with 5 balanced classes, random guessing gives 20%, so any accuracy above 20% is better than chance

Beyond overall accuracy, we report per-class metrics:

| Metric | What it measures | Formula |
|---|---|---|
| Precision | Of samples predicted as class X, how many are truly X? | TP / (TP + FP) |
| Recall | Of samples truly in class X, how many did we identify? | TP / (TP + FN) |
| F1 Score | Harmonic mean of precision and recall | 2 × (P × R) / (P + R) |

Interpreting the confusion matrix:

  • Diagonal elements: Correct predictions (true positives for each class)
  • Off-diagonal elements: Errors — reading row i, column j means “sample truly in class i was predicted as class j”
  • A perfect classifier has all counts on the diagonal
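
The per-class formulas can be illustrated in Python on a confusion matrix. The counts below are reconstructed from the confusion pairs in Section 13.5 together with the per-class recall of the Wasserstein KNN run in Section 9.1, on the assumption that both describe the same run; the helper name is illustrative.

```python
classes = ["CIPERMETRINA", "CONTROL", "ENDOSULFAN", "GLIFOSATO", "SPINOSAD"]

# Rows = true class, columns = predicted class (reconstructed counts).
confusion = [
    [0, 0, 1, 1, 3],
    [0, 4, 1, 0, 0],
    [2, 2, 1, 0, 0],
    [1, 0, 0, 2, 2],
    [1, 0, 0, 0, 4],
]


def per_class_metrics(matrix, k):
    """Precision, recall and F1 for class index k."""
    tp = matrix[k][k]
    predicted_k = sum(row[k] for row in matrix)  # TP + FP (column sum)
    truly_k = sum(matrix[k])                     # TP + FN (row sum)
    precision = tp / predicted_k if predicted_k else 0.0
    recall = tp / truly_k if truly_k else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


accuracy = sum(confusion[k][k] for k in range(5)) / sum(map(sum, confusion))
metrics = {name: per_class_metrics(confusion, k) for k, name in enumerate(classes)}
for name, (p, r, f) in metrics.items():
    print(f"{name}: precision={p:.2f}, recall={r:.2f}, f1={f:.2f}")
```

The diagonal sums to 11 of 25 samples (44.0%), and the printed metrics agree with the per-class values reported for KNN with Wasserstein distance.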

9.1 KNN with Wasserstein Distance (LOOCV)

Accuracy (KNN Wasserstein k=3): 44.0%

Per-class metrics:
  CIPERMETRINA: precision=0.0, recall=0.0, f1=0.0
  CONTROL: precision=0.67, recall=0.8, f1=0.73
  ENDOSULFAN: precision=0.33, recall=0.2, f1=0.25
  GLIFOSATO: precision=0.67, recall=0.4, f1=0.5
  SPINOSAD: precision=0.44, recall=0.8, f1=0.57

9.2 KNN on Vectorized Features (5-fold CV)

Accuracy (KNN vectorized, 5-fold): 28.0% +/- 22.8%

9.2.1 Confusion Matrix: Vectorized Features (LOOCV)

To understand which classes are confused, we run LOOCV to get predictions:

LOOCV Accuracy: 24.0%

Per-class metrics (Vectorized Features):
  CIPERMETRINA: precision=0.0, recall=0.0, f1=0.0
  CONTROL: precision=0.0, recall=0.0, f1=0.0
  ENDOSULFAN: precision=0.6, recall=0.6, f1=0.6
  GLIFOSATO: precision=0.33, recall=0.2, f1=0.25
  SPINOSAD: precision=0.2, recall=0.4, f1=0.27

9.3 KNN on Rich Stats (5-fold CV)

Accuracy (KNN rich stats, 5-fold): 40.0% +/- 14.1%

9.3.1 Confusion Matrix: Rich Stats (LOOCV)

LOOCV Accuracy: 36.0%

Per-class metrics (Rich Stats):
  CIPERMETRINA: precision=0.0, recall=0.0, f1=0.0
  CONTROL: precision=0.67, recall=0.8, f1=0.73
  ENDOSULFAN: precision=0.5, recall=0.4, f1=0.44
  GLIFOSATO: precision=0.67, recall=0.4, f1=0.5
  SPINOSAD: precision=0.17, recall=0.2, f1=0.18

9.4 Classification Comparison

=== Classification Methods Comparison ===

1. KNN Wasserstein (k=3, LOOCV):        44.0%
2. KNN Vectorized Features (k=3, 5-fold): 28.0% +/- 22.8%
3. KNN Rich Stats (k=3, 5-fold):         40.0% +/- 14.1%

10 Method Comparison: TDA vs Traditional Approaches

To test whether TDA provides value beyond mathematical sophistication, we compare it against traditional image analysis methods.

10.1 Why Compare Methods?

This comparison answers a critical question for expert reviewers: “Why use TDA when simpler methods might work?”

We test three alternative approaches:

  1. PCA on raw pixels: Dimensionality reduction on flattened images
  2. Handcrafted features: Domain-informed image statistics
  3. TDA methods: Our topological approach (for reference)

What This Comparison Reveals
  • If TDA outperforms: Topological structure captures information that pixel-level methods miss
  • If alternatives perform similarly: TDA may be unnecessarily complex for this problem
  • If PCA dominates: Raw pixel patterns sufficient; topology adds little value

This provides empirical justification (or refutation) of the TDA methodology choice.

10.2 Alternative Feature Extraction

10.2.1 Method 1: PCA on Raw Pixels

Extracting raw pixel features...
Raw pixel matrix size: (25, 10000)
  (25 samples × 10000 pixels)

Applying PCA...
PCA with 20 components:
  Variance explained: 90.0%
  Reduced dimensions: 20

10.2.2 Method 2: Handcrafted Image Features

Extracting handcrafted features...
Handcrafted feature matrix size: (25, 8)
  Features: ["mean_intensity", "std_intensity", "max_intensity", "edge_strength", "center_dist", "spread_y", "spread_x", "density"]

10.3 Classification Performance Comparison


=== Method Comparison Results ===

Method                        | Features | Accuracy (Mean ± SD)      | Range
--------------------------------------------------------------------------------
Handcrafted Features          |        8 | 48.0% ± 11.0% | [40.0%, 60.0%]
TDA: Rich Stats (H1)          |       12 | 40.0% ± 14.1% | [20.0%, 60.0%]
TDA: Vectorized Diagram       |      362 | 28.0% ± 22.8% | [0.0%, 60.0%]
PCA on Pixels (20 comp)       |       20 | 16.0% ±  8.9% | [0.0%, 20.0%]

10.4 Interpretation


=== Key Findings ===

Best TDA method: TDA: Rich Stats (H1)
  Accuracy: 40.0% ± 14.1%

Best alternative: Handcrafted Features
  Accuracy: 48.0% ± 11.0%

TDA advantage: -8.0 percentage points

⚠ Traditional methods OUTPERFORM TDA
  Consider simpler approaches for this problem

10.5 Strengths and Weaknesses of Each Approach

| Method | Strengths | Weaknesses | Best Use Case |
|---|---|---|---|
| TDA Rich Stats | Interpretable features; topologically invariant; few features (low overfitting) | Loses spatial info; requires TDA expertise | Small samples, need interpretability |
| TDA Vectorized | Captures full diagram; multi-scale information | High-dimensional; harder to interpret | Large samples, complex structure |
| PCA on Pixels | Simple baseline; no domain knowledge needed | Sensitive to rotation/translation; high-dimensional input | Quick baseline, large datasets |
| Handcrafted Features | Fast computation; domain-informed | Requires expert feature engineering; may miss subtle patterns | When domain knowledge available |

Why This Matters for Expert Review

Comparing methods demonstrates that:

  1. TDA is not arbitrary: Empirical evidence shows whether topology adds value
  2. Interpretability vs accuracy trade-off: TDA rich stats offer interpretable features, but their accuracy (40.0% ± 14.1%) is below that of handcrafted features (48.0% ± 11.0%)
  3. Small sample robustness: With N=25, lower-dimensional TDA features (12 dims) may generalize better than high-dimensional pixel features (10,000 dims → 20 PCA components), which reached only 16.0% ± 8.9%

On this dataset the comparison does not show an accuracy advantage for TDA over handcrafted features; the case for TDA here rests on interpretability rather than classification performance.

11 Feature Importance

12×2 DataFrame
Row feature importance
Symbol Float64
1 birth_range 0.098
2 std_persistence 0.082
3 q75 0.074
4 q90 0.054
5 entropy 0.05
6 median_birth 0.04
7 n_features 0.022
8 total_persistence 0.022
9 q25 0.008
10 max_persistence -0.0
11 median_persistence -0.022
12 q50 -0.044

12 Biological Interpretation

12.1 Feature Meaning

| Dimension | Web Structure | Interpretation |
|---|---|---|
| H0 (components) | Disconnected fragments | More H0 = broken/fragmented web |
| H1 (loops/cycles) | Closed cells/meshes | More H1 = more closed cells |
| Entropy H1 | Cell uniformity | High entropy = regular cells |
| Max persistence H1 | Largest hole/gap | High = large gap in web |

12.2 Drug Effects Summary

Drug Effects Compared to CONTROL:

CONTROL baseline - Entropy: 4.5, H1 count: 97.6

CIPERMETRINA:
  Entropy: 4.027 (-10.5%)
  H1 count: 65.0 (-33.4%)
  Effect: Fewer closed cells, More irregular cells

ENDOSULFAN:
  Entropy: 4.072 (-9.5%)
  H1 count: 71.0 (-27.3%)
  Effect: Fewer closed cells

GLIFOSATO:
  Entropy: 3.813 (-15.3%)
  H1 count: 59.2 (-39.3%)
  Effect: Fewer closed cells, More irregular cells

SPINOSAD:
  Entropy: 4.117 (-8.5%)
  H1 count: 70.6 (-27.7%)
  Effect: Fewer closed cells

13 Enhanced Separability Analysis

This section tests two hypotheses:

  1. CONTROL is clearly separable from all drug-treated groups
  2. Drug classes are NOT easily separable from each other

13.1 Distance Combination

We combine Wasserstein distance (topological structure) with Euclidean distance (rich statistics features) to potentially improve classification.

=== Distance Combination Optimization ===
Best alpha: 0.5
Best accuracy: 44.0%

Interpretation:
  alpha = 1.0 means pure Wasserstein distance
  alpha = 0.0 means pure Euclidean (rich stats) distance
=== Classification Accuracy Comparison ===
Wasserstein only:  44.0%
Euclidean only:    36.0%
Combined (α=0.5): 44.0%
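
A weighted combination of two distance matrices can be sketched in Python as below. How the notebook puts the two matrices on a common scale is not stated; dividing each by its maximum entry is an assumption made here, as are the function name and the toy matrices.

```python
def combine_distances(wasserstein, euclidean, alpha):
    """alpha * Wasserstein + (1 - alpha) * Euclidean, each rescaled to [0, 1]."""
    def rescale(matrix):
        top = max(max(row) for row in matrix) or 1.0  # avoid dividing by zero
        return [[d / top for d in row] for row in matrix]

    w, e = rescale(wasserstein), rescale(euclidean)
    n = len(w)
    return [[alpha * w[i][j] + (1 - alpha) * e[i][j] for j in range(n)]
            for i in range(n)]


# Toy 2x2 matrices on very different scales.
toy_w = [[0.0, 10.0], [10.0, 0.0]]
toy_e = [[0.0, 2.0], [2.0, 0.0]]
combined = combine_distances(toy_w, toy_e, alpha=0.5)
print(combined)
```

Without rescaling, the matrix with the larger raw values would dominate the sum regardless of alpha.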

13.2 Binary Classification: Control vs Drug

Collapsing all drugs into a single “DRUG” class tests whether CONTROL can be clearly distinguished from treated webs.

=== Binary Classification: CONTROL vs DRUG ===
Accuracy: 88.0%
95% CI: [75.9%, 100.0%]
Sensitivity (Control recall): 80.0%
Specificity (Drug recall): 90.0%

13.2.1 ROC Curve Analysis

The ROC curve shows how well we can detect CONTROL samples using distance to the Control centroid.

ROC AUC: 0.955

Interpretation:
  AUC > 0.9: Excellent discrimination
  AUC 0.8-0.9: Good discrimination
  AUC 0.7-0.8: Fair discrimination
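
The AUC can be computed without tracing the curve, as the probability that a randomly chosen CONTROL sample scores higher than a randomly chosen drug sample. The Python sketch below uses the negated distance to the Control centroid as the score; the distances are hypothetical toy values, not the study's data.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical distances to the Control centroid (smaller = more control-like).
control_dist = [1.0, 2.0, 3.0]
drug_dist = [2.5, 4.0, 5.0, 6.0]

# Negate so that a higher score means "more likely CONTROL".
auc = roc_auc([-d for d in control_dist], [-d for d in drug_dist])
print(f"AUC = {auc:.3f}")
```

In this toy case 11 of the 12 control/drug pairs are ordered correctly, so the AUC is 11/12.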

13.3 Separability Metrics

13.3.1 Within-Class vs Between-Class Distance Ratios

A lower ratio indicates better class separation. Ratios above 0.8 suggest overlapping classes.

=== Within/Between Distance Ratios ===
Full 5-class:        0.739 - moderately separated
Binary (Ctrl/Drug):  0.607 - moderately separated
Drugs only (4-class): 0.839 - overlapping

13.3.2 Silhouette Score Analysis

Silhouette scores measure how well-defined each cluster is. Higher is better:

  • > 0.5: Good separation
  • 0.25-0.5: Weak separation
  • < 0.25: Poor separation (overlapping)

=== Silhouette Scores by Class ===
Overall mean: -0.016

SPINOSAD: 0.024 (poor)
CIPERMETRINA: -0.027 (poor)
CONTROL: -0.013 (poor)
ENDOSULFAN: 0.001 (poor)
GLIFOSATO: -0.067 (poor)
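
For reference, the silhouette score of a sample compares its mean distance to its own class (a) with its mean distance to the nearest other class (b), as (b - a) / max(a, b). The Python sketch below computes it from a precomputed distance matrix; the function name and toy data are illustrative.

```python
def silhouette_scores(dist, labels):
    """Per-sample silhouette values from a distance matrix and class labels."""
    groups = sorted(set(labels))
    scores = []
    for i, own in enumerate(labels):
        mean_to = {}
        for g in groups:
            members = [j for j, lab in enumerate(labels) if lab == g and j != i]
            if members:
                mean_to[g] = sum(dist[i][j] for j in members) / len(members)
        a = mean_to.get(own)
        others = [v for g, v in mean_to.items() if g != own]
        if a is None or not others or max(a, min(others)) == 0:
            scores.append(0.0)  # convention for singleton or degenerate cases
            continue
        b = min(others)
        scores.append((b - a) / max(a, b))
    return scores


# Toy data: two tight, well-separated pairs on a line.
positions = [0, 1, 10, 11]
toy_labels = ["a", "a", "b", "b"]
toy_dist = [[abs(p - q) for q in positions] for p in positions]
toy_scores = silhouette_scores(toy_dist, toy_labels)
print(sum(toy_scores) / len(toy_scores))
```

Well-separated classes give values near 1; values near 0, as reported above, mean a sample is about as close to another class as to its own.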

13.3.3 Pairwise Group Distances

15×5 DataFrame
Row group1 group2 mean_distance std_distance n_pairs
String String Float64 Float64 Int64
1 CIPERMETRINA CIPERMETRINA 191.607 59.6837 20
2 CIPERMETRINA CONTROL 403.084 135.293 25
3 CIPERMETRINA ENDOSULFAN 244.223 61.5255 25
4 CIPERMETRINA GLIFOSATO 264.904 90.2555 25
5 CIPERMETRINA SPINOSAD 186.761 60.8101 25
6 CONTROL CONTROL 322.777 202.577 20
7 CONTROL ENDOSULFAN 324.871 164.328 25
8 CONTROL GLIFOSATO 537.583 136.117 25
9 CONTROL SPINOSAD 409.455 147.488 25
10 ENDOSULFAN ENDOSULFAN 223.484 37.978 20
11 ENDOSULFAN GLIFOSATO 325.272 88.6281 25
12 ENDOSULFAN SPINOSAD 253.009 49.6141 25
13 GLIFOSATO GLIFOSATO 276.809 62.9035 20
14 GLIFOSATO SPINOSAD 282.145 90.0662 25
15 SPINOSAD SPINOSAD 178.751 41.1936 20
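
The group means in this table are enough to recompute two of the within/between ratios reported in 13.3.1. The Python sketch below (not the notebook's code) assumes the ratio is the mean within-group distance divided by the mean between-group distance. Because every group has the same number of webs, averaging the group-level means is equivalent to pooling the individual pairs. The helper name `within_between_ratio` is hypothetical.

```python
from statistics import mean

# Mean distances copied from the pairwise table above.
within = {
    "CIPERMETRINA": 191.607,
    "CONTROL": 322.777,
    "ENDOSULFAN": 223.484,
    "GLIFOSATO": 276.809,
    "SPINOSAD": 178.751,
}
between = {
    ("CIPERMETRINA", "CONTROL"): 403.084,
    ("CIPERMETRINA", "ENDOSULFAN"): 244.223,
    ("CIPERMETRINA", "GLIFOSATO"): 264.904,
    ("CIPERMETRINA", "SPINOSAD"): 186.761,
    ("CONTROL", "ENDOSULFAN"): 324.871,
    ("CONTROL", "GLIFOSATO"): 537.583,
    ("CONTROL", "SPINOSAD"): 409.455,
    ("ENDOSULFAN", "GLIFOSATO"): 325.272,
    ("ENDOSULFAN", "SPINOSAD"): 253.009,
    ("GLIFOSATO", "SPINOSAD"): 282.145,
}


def within_between_ratio(groups):
    """Mean within-group distance over mean between-group distance."""
    w = mean(within[g] for g in groups)
    b = mean(d for (g1, g2), d in between.items() if g1 in groups and g2 in groups)
    return w / b


all_groups = set(within)
drugs_only = all_groups - {"CONTROL"}

ratio_all = within_between_ratio(all_groups)
ratio_drugs = within_between_ratio(drugs_only)
print(f"Full 5-class: {ratio_all:.3f}")
print(f"Drugs only:   {ratio_drugs:.3f}")
```

Under this assumed definition the sketch agrees with the reported 0.739 and 0.839 to three decimals. The binary ratio is not reproduced here because the merged DRUG class pools pairs from groups of unequal size.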

13.4 PERMANOVA Tests

PERMANOVA tests whether group centroids differ significantly in multivariate space. It works directly on the Wasserstein distance matrix.
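
A minimal Python sketch of the test, using the standard distance-based sums of squares and a label-permutation p-value, is given below. It is an illustration under stated assumptions, not the notebook's implementation: the function names `pseudo_f` and `permanova`, the number of permutations, and the toy distance matrix are all hypothetical.

```python
import random
from itertools import combinations


def pseudo_f(dist, labels):
    """PERMANOVA pseudo-F computed from a square distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares: same quantity per group, divided by group size.
    ss_within = 0.0
    for g in groups:
        members = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
    ss_between = ss_total - ss_within
    a = len(groups)
    return (ss_between / (a - 1)) / (ss_within / (n - a))


def permanova(dist, labels, n_perm=999, seed=0):
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical toy data: two groups of three points on a line.
positions = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
labels = ["A", "A", "A", "B", "B", "B"]
dist = [[abs(p - q) for q in positions] for p in positions]

f_stat, p_value = permanova(dist, labels)
print(f"Pseudo-F: {f_stat:.2f}, p-value: {p_value:.4f}")
```

For Euclidean distances on a single variable the pseudo-F coincides with the classical one-way ANOVA F, which is 150 for this toy example. With the +1 correction the smallest attainable p-value is 1/(n_perm + 1), so a printed p-value should be read together with the number of permutations used.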

13.4.1 Control vs Drugs

=== PERMANOVA: Control vs Drugs ===
Pseudo-F: 10.81
p-value: 0.0001

✓ CONTROL centroid significantly differs from DRUG centroid (p < 0.05)

13.4.2 Drug Equivalence Test

Testing whether drug groups differ from each other (excluding CONTROL).

=== PERMANOVA: Among Drugs Only ===
Pseudo-F: 3.25
p-value: 0.0001

Interpretation: Some drug differences detected

13.4.3 Pairwise Drug Comparisons

Testing each pair of drugs to see if they can be statistically distinguished.

=== Pairwise Drug Permutation Tests (Entropy) ===
6×6 DataFrame
Row drug1 drug2 mean_diff p_value significant interpretation
String String Float64 Float64 Bool String
1 CIPERMETRINA ENDOSULFAN 0.0450259 0.70443 false NOT distinguishable
2 CIPERMETRINA GLIFOSATO 0.214446 0.060494 false NOT distinguishable
3 CIPERMETRINA SPINOSAD 0.089826 0.19618 false NOT distinguishable
4 ENDOSULFAN GLIFOSATO 0.259471 0.105689 false NOT distinguishable
5 ENDOSULFAN SPINOSAD 0.0448001 0.716228 false NOT distinguishable
6 GLIFOSATO SPINOSAD 0.304272 0.00939906 true distinguishable
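
To make the mechanics of such a test concrete, the Python sketch below runs an exact two-sided permutation test on the difference of group means by enumerating every way of splitting ten values into two groups of five. This is an assumption-laden illustration: how the notebook draws its permutations is not shown, the function name `exact_permutation_test` is hypothetical, and the entropy values are made up.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_test(x, y):
    """Exact two-sided permutation test on the absolute difference of means."""
    pooled = list(x) + list(y)
    observed = abs(mean(x) - mean(y))
    indices = range(len(pooled))
    extreme = total = 0
    for chosen in combinations(indices, len(x)):
        chosen_set = set(chosen)
        gx = [pooled[i] for i in chosen]
        gy = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(gx) - mean(gy)) >= observed - 1e-12:
            extreme += 1
    return observed, extreme / total, total


# Hypothetical entropy values for two drug groups (not from the dataset).
drug_a = [0.1, 0.2, 0.3, 0.4, 0.5]
drug_b = [0.6, 0.7, 0.8, 0.9, 1.0]

diff, p_value, n_splits = exact_permutation_test(drug_a, drug_b)
print(f"mean difference = {diff:.3f}, p = {p_value:.4f} over {n_splits} splits")
```

Even for two completely separated groups of five, the exact two-sided p-value cannot go below 2/252 (about 0.0079), because only 252 distinct splits exist. This is one concrete sense in which permutation tests have limited precision at N=5 per group.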

13.5 Confusion Analysis

Which classes are most often confused with each other?

=== Top Confusion Pairs ===
10×4 DataFrame
Row true_class predicted_class confusion_rate count
String String Float64 Int64
1 CIPERMETRINA SPINOSAD 60.0 3
2 ENDOSULFAN CIPERMETRINA 40.0 2
3 ENDOSULFAN CONTROL 40.0 2
4 GLIFOSATO SPINOSAD 40.0 2
5 CIPERMETRINA ENDOSULFAN 20.0 1
6 CIPERMETRINA GLIFOSATO 20.0 1
7 CONTROL ENDOSULFAN 20.0 1
8 GLIFOSATO CIPERMETRINA 20.0 1
9 SPINOSAD CIPERMETRINA 20.0 1
10 CIPERMETRINA CONTROL 0.0 0

13.6 Summary: Separability Evidence

============================================================
SEPARABILITY ANALYSIS SUMMARY
============================================================

### Evidence that CONTROL is SEPARABLE ###
Binary classification accuracy: 88.0%
ROC AUC: 0.955
PERMANOVA (Ctrl vs Drugs) p-value: 0.0001
Control silhouette score: -0.013

Conclusion: ✓ CONTROL IS CLEARLY SEPARABLE

### Evidence that DRUGS are NOT separable ###
Drug-only PERMANOVA p-value: 0.0001
Drugs-only within/between ratio: 0.839
Mean drug silhouette: -0.017

Conclusion: ✓ DRUGS ARE NOT EASILY SEPARABLE

============================================================

14 Limitations and Future Directions

14.1 Methodological Limitations

14.1.1 Sample Size

N=5 per group is insufficient for robust statistical inference:

  • Effect size estimates have very wide confidence intervals
  • High risk of Type II error (missing true effects)
  • Classification accuracy likely overestimated due to overfitting
  • Permutation tests have limited precision with small sample sizes

Impact: Results should be viewed as exploratory and hypothesis-generating, not confirmatory.

14.1.2 Parameter Selection

All preprocessing and TDA parameters were chosen heuristically without systematic optimization:

Parameter | Value Used | Justification
blur | 2 | Heuristic choice (not optimized)
threshold | 0.1 | Visual inspection (not data-driven)
sample_size | 1000 points | Computational convenience (not validated)
rips_cutoff | 5 | Arbitrary choice (no sensitivity analysis shown)
k (KNN) | 3 | Standard default (not tuned)
Wasserstein | (p=1, q=2) | Not compared to alternatives

Impact: Results may be sensitive to these choices. A systematic sensitivity analysis would strengthen conclusions (recommended for future work).

14.1.3 Validation Strategy

No independent validation dataset was available:

  • All reported accuracies use cross-validation on the same 25 samples
  • High risk of overfitting to sample-specific patterns
  • True generalization performance likely lower than reported

Recommended validation hierarchy (for future studies):

  1. Level 1: Internal validation (current LOOCV)
  2. Level 2: Temporal validation (same cohort, different timepoints)
  3. Level 3: External validation (independent lab, different spiders)
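
For reference, Level 1 validation with a k-nearest-neighbour classifier on a precomputed distance matrix can be sketched as follows. This is a Python illustration, not the notebook's code: the function name `loocv_knn_accuracy`, the tie-breaking rule, and the toy data are assumptions; only k=3 is taken from the parameter table in 14.1.2.

```python
from collections import Counter


def loocv_knn_accuracy(dist, labels, k=3):
    """Leave-one-out accuracy of a k-NN classifier on a distance matrix."""
    n = len(labels)
    correct = 0
    for i, truth in enumerate(labels):
        # Hold out sample i and rank all remaining samples by distance to it.
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        votes = Counter(labels[j] for j in others[:k])
        # On a tied vote, most_common keeps first-seen order, i.e. the nearer neighbour.
        predicted = votes.most_common(1)[0][0]
        correct += predicted == truth
    return correct / n


# Hypothetical toy data: two well-separated groups on a line.
positions = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
labels = ["A", "A", "A", "B", "B", "B"]
dist = [[abs(p - q) for q in positions] for p in positions]

accuracy = loocv_knn_accuracy(dist, labels, k=3)
print(f"LOOCV accuracy: {accuracy:.1%}")
```

Every sample is used for both training and testing across folds, which is why this level of validation cannot reveal whether the pipeline's hand-chosen parameters were themselves fitted to these 25 webs.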

14.2 Biological Limitations

14.2.1 Uncontrolled Confounders

The original dataset lacks metadata for critical experimental variables:

  • Spider biology: Species, age, sex, size not recorded
  • Drug protocol: Dosages, exposure duration, administration method unknown
  • Environmental conditions: Temperature, humidity, light not controlled
  • Web collection: Time post-exposure unclear; web completeness varies

Impact: Observed differences could reflect confounding variables rather than drug effects alone.

14.2.2 Mechanism Unclear

TDA detects structural differences but doesn’t explain why:

  • H1 features (loops/cells) may reflect motor control, silk production, cognitive effects, or combinations
  • Different drugs with different mechanisms show similar topological signatures
  • Requires neurobiology/toxicology expertise for causal interpretation

14.2.3 Generalizability Unknown

  • Results specific to one spider species (unidentified in dataset)
  • Drug effects may vary across species, life stages, dosages
  • Environmental context (lab vs. field) not specified
  • Replication on independent datasets essential

14.3 Statistical Concerns

14.3.1 Multiple Testing

  • 12+ statistical tests conducted without strict family-wise error rate control
  • Bonferroni correction mentioned (α = 0.004) but not consistently applied
  • With small N, correction further reduces power

Approach taken: Report raw p-values; prioritize effect sizes and consistency across tests
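
To illustrate what stricter error control would look like, the Python sketch below applies Bonferroni and Holm corrections to the six raw p-values from the pairwise drug table in 13.4.3. Treating exactly these six tests as one family, and using α = 0.05, are assumptions made for the example; the document's α = 0.004 corresponds to a larger family of about 12 tests.

```python
# Raw p-values copied from the pairwise drug permutation tests (13.4.3).
p_values = {
    ("CIPERMETRINA", "ENDOSULFAN"): 0.70443,
    ("CIPERMETRINA", "GLIFOSATO"): 0.060494,
    ("CIPERMETRINA", "SPINOSAD"): 0.19618,
    ("ENDOSULFAN", "GLIFOSATO"): 0.105689,
    ("ENDOSULFAN", "SPINOSAD"): 0.716228,
    ("GLIFOSATO", "SPINOSAD"): 0.00939906,
}
alpha = 0.05          # assumed family-wise error rate
m = len(p_values)     # assumed family: the six pairwise tests only

# Bonferroni: compare every p-value with alpha / m.
bonferroni = {pair: p <= alpha / m for pair, p in p_values.items()}

# Holm step-down: compare the sorted p-values with alpha / (m - rank)
# and stop rejecting at the first failure.
holm = {}
still_rejecting = True
for rank, (pair, p) in enumerate(sorted(p_values.items(), key=lambda kv: kv[1])):
    still_rejecting = still_rejecting and p <= alpha / (m - rank)
    holm[pair] = still_rejecting

for pair, p in p_values.items():
    print(f"{pair[0]:13s} vs {pair[1]:10s} p={p:.4f} "
          f"Bonferroni={bonferroni[pair]} Holm={holm[pair]}")
```

Under these assumptions the Bonferroni threshold is 0.05/6 ≈ 0.0083, so the GLIFOSATO vs SPINOSAD contrast (raw p ≈ 0.0094) would not survive either correction, and it would also fail the stricter α = 0.004.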

14.3.2 Cross-Validation Variance

  • LOOCV on N=25 has high variance
  • K-fold CV (k=5) results show wide standard deviations
  • Single train/test split would be even less reliable given small N

14.3.3 Distance Metric Choice

  • Wasserstein distance chosen arbitrarily
  • No comparison to Bottleneck distance or other metrics
  • Different metrics may yield different classification results

14.4 What This Study IS and IS NOT

14.4.1 ✓ What This Study IS

  1. Proof-of-concept demonstrating TDA’s applicability to toxicological screening
  2. Methodological contribution showing topological features capture web structure
  3. Hypothesis-generating exploratory analysis identifying promising features
  4. Reproducible pipeline with documented code and methods

14.4.2 ✗ What This Study IS NOT

  1. NOT definitive biological conclusions about drug effects
  2. NOT generalizable beyond this specific dataset
  3. NOT adequately powered for detecting small-to-medium effects
  4. NOT validated on independent data

14.5 Strengths Despite Limitations

Despite small sample size and methodological constraints, this work demonstrates:

  1. Novel application: First TDA analysis of drug-induced spider webs
  2. Clear CONTROL separation: Binary classification (88% accuracy, ROC AUC 0.955) suggests that drugs do affect web topology
  3. Reproducible methods: All code and analysis fully documented
  4. Transparent limitations: Honest acknowledgment of constraints builds trust

14.6 Future Directions

14.6.1 Immediate Improvements (Next Study)

  1. Increase sample size: Target N ≥ 20 per group for adequate power
  2. Independent validation: Collect hold-out test set (70/30 train/test split)
  3. Systematic parameter tuning: Grid search with nested cross-validation
  4. Controlled experiments: Standardize spider species, age, drug dosage, exposure time

14.6.2 Advanced Methodology

  1. Comparison to baselines: Test against CNN classifiers and traditional image features
  2. Multi-scale analysis: Vary filtration parameters systematically
  3. Statistical topology methods: Use recent advances for inference on persistence diagrams
  4. Temporal dynamics: If possible, track same spiders building multiple webs

14.6.3 Biological Integration

  1. Dose-response curves: Test multiple drug concentrations
  2. Behavioral correlates: Link topological features to specific motor/cognitive deficits
  3. Multi-species comparison: Test generalizability across spider families
  4. Mechanism investigation: Use neurobiology techniques to validate TDA findings

14.7 Conclusions with Appropriate Caveats

This exploratory analysis demonstrates that:

  1. TDA captures drug effects: CONTROL webs are topologically distinguishable from drug-treated webs
  2. H1 features are informative: Entropy, cycle count, and persistence metrics vary systematically
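  3. Drug classes overlap: Drug groups differ at the centroid level (drugs-only PERMANOVA p = 0.0001) but are not reliably separable sample by sample (within/between ratio 0.839, silhouette scores near zero), which may indicate common disruption pathways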
  4. Method shows promise: TDA provides interpretable, geometrically-motivated features

However, with N=5 per group and no independent validation, these findings require:

  • Replication on larger, independent datasets
  • Systematic parameter validation
  • Controlled experimental conditions
  • Biological mechanism investigation

This work is best viewed as methodological proof-of-concept rather than definitive toxicological findings.

Citation

BibTeX citation:
@online{vituri_f._pinto2026,
  author = {Vituri F. Pinto, Guilherme and , Telmo and , ??????},
  title = {Classifying Drug-Induced Webs Using {Topological} {Data}
    {Analysis}},
  date = {2026-02-06},
  langid = {en},
  abstract = {We studied etc etc etc}
}
For attribution, please cite this work as:
Vituri F. Pinto, Guilherme, Telmo, and ?????? 2026. “Classifying Drug-Induced Webs Using Topological Data Analysis.” Earth and Space Science. February 6, 2026.